Serveur d'exploration MERS

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

An Elegant Algorithm for the Construction of Suffix Arrays.

Identifieur interne : 001D49 ( Main/Exploration ); précédent : 001D48; suivant : 001D50

An Elegant Algorithm for the Construction of Suffix Arrays.

Auteurs : Sanguthevar Rajasekaran [États-Unis] ; Marius Nicolae [États-Unis]

Source :

RBID : pubmed:25045344

Abstract

The suffix array is a data structure that finds numerous applications in string processing problems for both linguistic texts and biological data. It has been introduced as a memory efficient alternative for suffix trees. The suffix array consists of the sorted suffixes of a string. There are several linear time suffix array construction algorithms (SACAs) known in the literature. However, one of the fastest algorithms in practice has a worst case run time of O(n2). The problem of designing practically and theoretically efficient techniques remains open. In this paper we present an elegant algorithm for suffix array construction which takes linear time with high probability; the probability is on the space of all possible inputs. Our algorithm is one of the simplest of the known SACAs and it opens up a new dimension of suffix array construction that has not been explored until now. Our algorithm is easily parallelizable. We offer parallel implementations on various parallel models of computing. We prove a lemma on the ℓ-mers of a random string which might find independent applications. We also present another algorithm that utilizes the above algorithm. This algorithm is called RadixSA and has a worst case run time of O(n log n). RadixSA introduces an idea that may find independent applications as a speedup technique for other SACAs. An empirical comparison of RadixSA with other algorithms on various datasets reveals that our algorithm is one of the fastest algorithms to date. The C++ source code is freely available at http://www.engr.uconn.edu/~man09004/radixSA.zip.

DOI: 10.1016/j.jda.2014.03.001
PubMed: 25045344


Affiliations:


Links toward previous steps (curation, corpus...)


Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">An Elegant Algorithm for the Construction of Suffix Arrays.</title>
<author>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2">
<nlm:affiliation>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT</wicri:regionArea>
<placeName>
<region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Nicolae, Marius" sort="Nicolae, Marius" uniqKey="Nicolae M" first="Marius" last="Nicolae">Marius Nicolae</name>
<affiliation wicri:level="2">
<nlm:affiliation>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT</wicri:regionArea>
<placeName>
<region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2014">2014</date>
<idno type="RBID">pubmed:25045344</idno>
<idno type="pmid">25045344</idno>
<idno type="doi">10.1016/j.jda.2014.03.001</idno>
<idno type="wicri:Area/PubMed/Corpus">001905</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001905</idno>
<idno type="wicri:Area/PubMed/Curation">001905</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001905</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001A01</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001A01</idno>
<idno type="wicri:Area/Ncbi/Merge">000E43</idno>
<idno type="wicri:Area/Ncbi/Curation">000E43</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000E43</idno>
<idno type="wicri:doubleKey">1570-8667:2014:Rajasekaran S:an:elegant:algorithm</idno>
<idno type="wicri:Area/Main/Merge">001D64</idno>
<idno type="wicri:Area/Main/Curation">001D49</idno>
<idno type="wicri:Area/Main/Exploration">001D49</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">An Elegant Algorithm for the Construction of Suffix Arrays.</title>
<author>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation wicri:level="2">
<nlm:affiliation>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT</wicri:regionArea>
<placeName>
<region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Nicolae, Marius" sort="Nicolae, Marius" uniqKey="Nicolae M" first="Marius" last="Nicolae">Marius Nicolae</name>
<affiliation wicri:level="2">
<nlm:affiliation>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Dept. of Computer Science and Engineering, Univ. of Connecticut, Storrs, CT</wicri:regionArea>
<placeName>
<region type="state">Connecticut</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of discrete algorithms (Amsterdam, Netherlands)</title>
<idno type="ISSN">1570-8667</idno>
<imprint>
<date when="2014" type="published">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The suffix array is a data structure that finds numerous applications in string processing problems for both linguistic texts and biological data. It has been introduced as a memory efficient alternative for suffix trees. The suffix array consists of the sorted suffixes of a string. There are several linear time suffix array construction algorithms (SACAs) known in the literature. However, one of the fastest algorithms in practice has a worst case run time of
<i>O</i>
(
<i>n</i>
<sup>2</sup>
). The problem of designing practically and theoretically efficient techniques remains open. In this paper we present an elegant algorithm for suffix array construction which takes linear time with high probability; the probability is on the space of all possible inputs. Our algorithm is one of the simplest of the known SACAs and it opens up a new dimension of suffix array construction that has not been explored until now. Our algorithm is easily parallelizable. We offer parallel implementations on various parallel models of computing. We prove a lemma on the ℓ-mers of a random string which might find independent applications. We also present another algorithm that utilizes the above algorithm. This algorithm is called RadixSA and has a worst case run time of
<i>O</i>
(
<i>n</i>
log
<i>n</i>
). RadixSA introduces an idea that may find independent applications as a speedup technique for other SACAs. An empirical comparison of RadixSA with other algorithms on various datasets reveals that our algorithm is one of the fastest algorithms to date. The C++ source code is freely available at http://www.engr.uconn.edu/~man09004/radixSA.zip.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Connecticut</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Connecticut">
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
</region>
<name sortKey="Nicolae, Marius" sort="Nicolae, Marius" uniqKey="Nicolae M" first="Marius" last="Nicolae">Marius Nicolae</name>
</country>
</tree>
</affiliations>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001D49 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001D49 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:25045344
   |texte=   An Elegant Algorithm for the Construction of Suffix Arrays.
}}

Pour générer des pages wiki

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:25045344" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021